Nomilo Fishpond Biogeochemical Analysis

Data Analysis Workflow:

  • Introduction
    • Research Questions
  • Install Packages
  • Load Libraries
  • Import Raw Data
    • Procedure
    • View Raw Data
  • Tidy Raw Data
    • Tidying Processes
    • Merge Tidied Datasets
  • Processed Datasets
    • Export Tidied Datasets
    • Data Dictionaries
  • Exploratory Data Analysis
    • Descriptive Statistics
    • Paired Line Plots: Oyster & Clam
    • Line Plots: Weather & KSF Data
  • Correlational Analysis

Nomilo Fishpond Biogeochemical Analysis

Author

Lysbeth Koster

Published

February 28, 2024

Introduction

[Explain the purpose of this website]

Research Questions

Interactive Code

Throughout this document, hover over the numbered annotations to the right of code chunks to reveal detailed explanations and comments about the code. Where italicized drop-down text is present, press the arrow to expand it and see the code.

Install Packages

packages <- c("rio", "tidyverse", "janitor", "lubridate", "rmarkdown", "fs", 
              "hms", "zoo", "corrplot", "kableExtra", "psych", "ggplot2", "shiny", 
              "rsconnect", "packrat", "shinylive", "shinycssloaders", "pak", 
              "plotly", "here", "gtsummary")


for (pkg in packages) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
}

Load Libraries

library(rio)
library(tidyverse)
library(janitor)
library(lubridate)
library(rmarkdown)
library(fs)
library(hms)
library(zoo)
library(corrplot)
library(kableExtra)
library(psych)
library(ggplot2)
library(shiny)
library(rsconnect)
library(packrat)
library(pak)
library(shinylive)
library(shinycssloaders)
library(stringr)
library(plotly)
library(here)
library(gtsummary)
  1. For importing Excel data
  2. For cleaning data
  3. For cleaning variable names
  4. For cleaning dates
  5. For displaying tables
  6. For file path usage

Import Raw Data

Procedure

Define vector of files to import:

create_vector_file_paths <- function(directory_path) {
  # List all files in the given directory path
  files_to_import <- fs::dir_ls(path = directory_path)
  
  # Loop through the files and print each with an index
  for (i in seq_along(files_to_import)) {
    cat(i, "= ", files_to_import[i], "\n")
  }
  
  # Return the vector of file paths
  return(files_to_import)
}

files_to_import <- create_vector_file_paths("data/raw")
1 =  data/raw/2024-02-28_dfs.RData 
2 =  data/raw/2024-02-28_ksf-clam-growth.xlsx 
3 =  data/raw/2024-02-28_ksf-compiled-data.xlsx 
4 =  data/raw/2024-02-28_ksf-oyster-cylinder-growth.xlsx 
5 =  data/raw/2024-02-28_profile-data.xlsx 
6 =  data/raw/2024-02-28_water-samples.xlsx 
7 =  data/raw/2024-02-28_weather-data.xlsx 
8 =  data/raw/2024-03-01_dfs-no-profiles.RData 
9 =  data/raw/2024-03-01_dfs_no_profiles.RData 
10 =  data/raw/2024-03-04_dfs-no-profiles.RData 
11 =  data/raw/2024-03-08_dfs-no-profiles.RData 
12 =  data/raw/2024-03-18_dfs-no-profiles.RData 
  1. Store the file paths of our raw data within the data/raw directory in files_to_import
  2. Print each file path with its index

Use the purrr::map() function to iteratively import files in the files_to_import vector except for the profiles data and .RData files:

@iteratively-import-raw-data Code Chunk Execution Warning

The @iteratively-import-raw-data code chunk should only be run when the raw data is updated, because it takes a long time to execute. Otherwise, run the @efficiently-load-raw-data code chunk to load the up-to-date raw data quickly.

dfs_no_profiles <- map(files_to_import[c(2:4, 6, 7)], import_list)
current_date <- format(Sys.Date(), "%Y-%m-%d")
save(dfs_no_profiles, file = paste0("data/raw/", current_date, "_dfs-no-profiles.RData"))
Ensure the Correct Index Value is Inputted Below

Refer to the output of the files_to_import data object to ensure you are inputting the correct index value corresponding to the file path that needs to be loaded.

Efficiently import up-to-date raw data:

load(files_to_import[10])
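
Hard-coding the index is fragile because it shifts whenever a new file is added to data/raw. As a minimal sketch, assuming the saved files keep the YYYY-MM-DD_dfs-no-profiles.RData naming shown in the listing above, the most recent file can be selected by name instead. The object names is_saved_list, saved_files, and latest_rdata are hypothetical, and the paths are copied from the listing rather than read from disk.

```r
# Paths copied from the directory listing above (illustrative subset)
files_to_import <- c(
  "data/raw/2024-02-28_dfs.RData",
  "data/raw/2024-02-28_water-samples.xlsx",
  "data/raw/2024-03-01_dfs-no-profiles.RData",
  "data/raw/2024-03-04_dfs-no-profiles.RData",
  "data/raw/2024-03-08_dfs-no-profiles.RData",
  "data/raw/2024-03-18_dfs-no-profiles.RData"
)

# Keep only the saved "dfs-no-profiles" files
is_saved_list <- grepl("_dfs-no-profiles\\.RData$", files_to_import)
saved_files <- files_to_import[is_saved_list]

# ISO dates sort chronologically as text, so the last sorted path is the newest
latest_rdata <- tail(sort(saved_files), 1)
latest_rdata
```

With this in place, load(latest_rdata) could stand in for load(files_to_import[10]).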

Rename datasets:

We will always use snake_case when naming our data objects and functions (e.g., data_object_name or function_name()).

names(dfs_no_profiles) <- gsub("data/raw/2024-02-28_|\\.xlsx$|\\.xls$", "", 
                               files_to_import[c(2:4, 6, 7)])
names(dfs_no_profiles) <- gsub("-", "_", names(dfs_no_profiles))
names(dfs_no_profiles)
  1. Remove prefixes and file extensions
  2. Replace hyphens with underscores
  3. Check if names were outputted correctly
[1] "ksf_clam_growth"            "ksf_compiled_data"         
[3] "ksf_oyster_cylinder_growth" "water_samples"             
[5] "weather_data"              
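
The pattern above hard-codes the 2024-02-28_ prefix, so it would need editing when files with a new date arrive. A possible alternative, sketched here with base R only, strips any leading YYYY-MM-DD_ prefix. The helper clean_dataset_name() is hypothetical and not part of the project code.

```r
# Hypothetical helper: derive a snake_case dataset name from a raw file path
clean_dataset_name <- function(path) {
  name <- tools::file_path_sans_ext(basename(path))       # drop directory and extension
  name <- sub("^[0-9]{4}-[0-9]{2}-[0-9]{2}_", "", name)    # drop a leading YYYY-MM-DD_ prefix
  gsub("-", "_", name)                                      # replace hyphens with underscores
}

paths <- c(
  "data/raw/2024-02-28_ksf-clam-growth.xlsx",
  "data/raw/2024-02-28_water-samples.xlsx"
)
dataset_names <- clean_dataset_name(paths)
dataset_names
```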

Rename each sheet within each raw dataset to be lowercased and replace spaces with underscores:

dfs_no_profiles <- map(dfs_no_profiles, ~ set_names(.x, gsub(" ", "_", tolower(names(.x)))))

Create separate datasets by specifying the Excel sheet from each spreadsheet we want to tidy:

ksf_clams_growth_data <- dfs_no_profiles$ksf_clam_growth$sheet1
ksf_compiled_data <- dfs_no_profiles$ksf_compiled_data$full_data
ksf_oyster_cylinder_growth_data <- dfs_no_profiles$ksf_oyster_cylinder_growth$sheet1
water_samples_data <- dfs_no_profiles$water_samples$data_overview
weather_data <- dfs_no_profiles$weather_data$weather_ksf
tidal_data <- dfs_no_profiles$ksf_compiled_data$tides

We want to combine multiple sheets within the profiles Excel spreadsheet into one dataset, so we import it separately:

sheets_to_import <- c("L1", "L2", "L3", "L4")

profiles_data <- map_dfr(sheets_to_import, function(sheet_name) {
  import(files_to_import[5], which = sheet_name)
})
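  1. Name the sheets (L1 to L4) to import from the profiles spreadsheet
  2. Import each sheet with rio::import()
  3. Row-bind the imported sheets into a single data frame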

View Raw Data

  • ksf_clams_growth_data
  • ksf_compiled_data
  • ksf_oyster_cylinder_growth_data
  • water_samples_data
  • weather_data
  • profiles_data
  • tidal_data

Tidy Raw Data

Tidying Processes

  • ksf_clams_growth_data_tidied
  • ksf_compiled_data_tidied
  • ksf_oyster_cylinder_growth_data_tidied
  • water_samples_data_tidied
  • weather_data_tidied
  • profiles_data_tidied
  • tidal_data_tidied
Steps to clean data
new_clam_var_names <- c(
  "sort_date", "color", "clams_in_count", "clams_in_lbs",  "clams_in_avg_per_lb",
  "clams_out_count", "clams_out_lbs", "clams_out_avg_per_lb", "growth_in_lbs", 
  "growth_pct", "sr", "days_btwn_sort"
  )

new_clam_date_col <- c(
  "2023-10-17", "2023-12-06", "2023-12-12", "2024-01-02",  "2024-01-10", "2024-01-24",
  "2024-01-31", "2024-02-08", "2024-02-13"
  )

ksf_clams_growth_data_tidied <- ksf_clams_growth_data %>%
  slice(-1) %>%
  setNames(new_clam_var_names) %>%
  mutate(date = as.Date(new_clam_date_col)) %>%
  dplyr::select(-sort_date) %>%
  pivot_longer(
    cols = c(
      clams_in_count, clams_in_lbs, clams_in_avg_per_lb,   clams_out_count, 
      clams_out_lbs, clams_out_avg_per_lb
      ),
    names_to = c("stage", ".value"),
    names_prefix = "clams_",
    names_sep = "_",
    values_to = "value"
  ) %>%
  mutate(stage = if_else(str_detect(stage, "in"), "In", "Out")) %>%
  rename(avg_per_lbs = avg) %>%
  mutate(across(c(color, stage), as.factor)) %>%
  mutate(across(c(count, lbs, avg_per_lbs, growth_in_lbs, growth_pct, sr),
                ~as.numeric(gsub("%", "", .)))) %>%
  arrange(date, color, stage) %>%
  dplyr::select(date, days_btwn_sort, color, stage, count, lbs, avg_per_lbs,
                growth_in_lbs, growth_pct, sr) %>%
  rename("days_btwn_clams_sort" = days_btwn_sort,
         "clams_color" = color,
         "clams_stage" = stage,
         "clams_count" = count,
         "weight" = lbs,
         "avg_weight" = avg_per_lbs,
         "clams_growth" = growth_in_lbs,
         "clams_sr" = sr, 
          "date_out" = date
         ) %>%
  mutate(date_in = date_out - days_btwn_clams_sort) %>%
  relocate(c(date_in, date_out), .before = days_btwn_clams_sort) %>%
  mutate(
    date = case_when(
      clams_stage == 'In' ~ date_in,
      clams_stage == 'Out' ~ date_out
    )
  ) %>%
  select(-date_in, -date_out) %>%
  mutate(grouping_variable = rep(1:(n() / 2), each = 2))


paged_table(ksf_clams_growth_data_tidied)
  1. Manually set variable names
  2. Assign dates to new date column
  3. Delete first row
  4. Set date as correct variable type and pivot data set based on date range
  5. Assign In and Out to stage
  6. Rename variable of average to average per lbs
  7. Set stage and color as factor variable types
  8. Set variables as numeric variable types
  9. Arrange values by date, color, and stage
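
To illustrate the pivot step, here is a minimal sketch on made-up data. It uses names_pattern as a swapped-in alternative to the names_prefix and names_sep arguments above: a pattern with two capture groups keeps multi-word measurement names intact, whereas splitting on every underscore appears to cut a name such as avg_per_lb down to avg (hence the later rename). The toy_wide and toy_long objects and their values are hypothetical.

```r
library(tidyr)

# Toy wide table: one row per sort, with "in" and "out" measurements side by side
toy_wide <- data.frame(
  color = c("red", "blue"),
  clams_in_count = c(10, 20),
  clams_in_lbs = c(1, 2),
  clams_out_count = c(8, 18),
  clams_out_lbs = c(1.5, 2.5)
)

# ".value" keeps the measurement part of each name (count, lbs) as its own column,
# while the first capture group (in/out) becomes the stage column
toy_long <- pivot_longer(
  toy_wide,
  cols = starts_with("clams_"),
  names_to = c("stage", ".value"),
  names_pattern = "clams_(in|out)_(.*)"
)
toy_long
```

Each original row becomes two rows, one per stage, which is the shape the paired line plots rely on.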
Steps to clean data
ksf_compiled_data_tidied <- ksf_compiled_data %>% 
  rename_with(~gsub("\\s*\\([^\\)]+\\)", "", .x)) %>%
  janitor::clean_names() %>%
  rename(date = date_time) %>%
  mutate(date = as.Date(date)) %>%
  filter(date >= as.Date("2023-11-20") & date <= as.Date("2024-02-20")) %>%
  arrange(date) %>%
  dplyr::select(-c(external_voltage, wk_num, wind_dir,
                   spadd, outdoor_temperature, hourly_rain,
                   solar_radiation, resistivity, battery_capacity,
                   hour, daynum, data_pt, wind_sp, diradd,
                   wind_speed, wind_direction, tide, day, month, year)
                ) %>%
  dplyr::select(where(~ !anyNA(.))) %>%
  group_by(date) %>%
  summarise(across(where(is.numeric), \(x) mean(x, na.rm = TRUE))) %>%
  rename("ksf_salinity" = salinity,
         "ksf_rdo_saturation" = rdo_saturation,
         "ksf_rdo_concentration" = rdo_concentration,
         "ksf_actual_conductivity" = actual_conductivity,
         "ksf_total_dissolved_solids" = total_dissolved_solids,
         "ksf_ammonium" = ammonium,
         "ksf_barometric_pressure" = barometric_pressure,
         "ksf_oxygen_partial_pressure" = oxygen_partial_pressure,
         "ksf_specific_conductivity" = specific_conductivity,
         "ksf_density" = density,
         "ksf_chlorophyll_a_fluorescence" = chlorophyll_a_fluorescence,
         "ksf_ammonium_m_v" = ammonium_m_v)

paged_table(ksf_compiled_data_tidied)
  1. Clean variable names by removing everything in parentheses, using lowercase and underscores in place of spaces
  2. Rename the date_time variable to date, filter to desired date range and sort by date
  3. Remove unnecessary variables
  4. Remove columns containing any NA values
  5. Group by date and calculate the average of every variable for each day
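
The two less obvious steps above can be sketched in base R: the regular expression that strips parenthesised units from column names, and the difference between dropping columns with any NA (what !anyNA() does) and dropping only columns that are entirely NA. The column names and values below are made up for illustration.

```r
# Strip a parenthesised unit, including the space before it, from column names
raw_names <- c("Salinity (PSU)", "RDO Saturation (%Sat)", "Date Time")
cleaned_names <- gsub("\\s*\\([^\\)]+\\)", "", raw_names)

# Toy data: column b has one missing value, column c is entirely missing
toy <- data.frame(a = c(1, 2), b = c(NA, 3), c = c(NA_real_, NA_real_))

# Same predicate as the pipeline: keep only columns with no NA at all
no_na_cols <- names(toy)[vapply(toy, function(col) !anyNA(col), logical(1))]

# Looser alternative: drop a column only when every value is NA
not_all_na_cols <- names(toy)[vapply(toy, function(col) !all(is.na(col)), logical(1))]
```

The stricter predicate removes column b as well, so a single missing reading is enough to drop a sensor variable from the tidied dataset.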
Steps to clean data
oyster_var_names <- c(
  "date", "oyster_large_weight", "oyster_large_gain", "oyster_small_weight",
  "oyster_small_gain", "oyster_chlorophyll"
  )

ksf_oyster_cylinder_growth_data_tidied <- ksf_oyster_cylinder_growth_data %>% 
  dplyr::select(c(1, 4, 5, 8, 9, 12)) %>%
  slice(-1) %>%
  setNames(oyster_var_names) %>%
  pivot_longer(
    cols = c(oyster_large_weight, oyster_large_gain,
             oyster_small_gain,
             oyster_small_weight),
    names_to = c("oyster_size", ".value"),
    names_prefix = "oyster_",
    names_sep = "_",
    values_to = "value"
  ) %>%
  mutate(oyster_size = if_else(str_detect(oyster_size, "small"), "Small", "Large")) %>%
  mutate(date = as.Date(date),
         oyster_size = as.factor(oyster_size),
         across(c(weight, gain), as.numeric)
        ) %>%
  filter(date >= as.Date("2023-11-20") & date <=
           as.Date("2024-02-14")) %>%
  mutate(weight = weight * 0.00220462) %>% 
  rename("growth_pct" = gain)

paged_table(ksf_oyster_cylinder_growth_data_tidied)
  1. Manually set variable names
  2. Select desired columns and remove first row
  3. Convert from wide to long format
  4. Create a new variable that differentiates oyster size
  5. Adjust data types to numeric and factor
  6. Filter to desired date range
Steps to clean data
water_samples_data_tidied <- water_samples_data %>%
  slice(-c(44:52)) %>% 
  rename_with(~gsub("\\s*\\([^\\)]+\\)", "", .x)) %>%
  janitor::clean_names() %>%
  mutate(
    date = if_else(date == "44074",
            as.character(as.Date("2024-01-09")),
            format(dmy(date), "%Y-%m-%d"))
  ) %>%
  mutate(sample_id = 1:nrow(.)) %>%
  mutate(date = as.Date(date),
         across(c(nomilo_id, location, round, depth), as.factor)) %>%
  select(-c(sample_id, nomilo_id, tube_name)) %>% 
  mutate(location = str_to_title(location)) %>%
  mutate(location = factor(location, 
                           levels = c("Mid Buoy", "Back Buoy", "Auwei", "Production Dock"))) %>% 
   filter(!is.na(location))

paged_table(water_samples_data_tidied)
  1. Clean variable names by removing everything in parentheses, using lowercase and underscores in place of spaces
  2. Replace incorrect date values and format as YYYY-MM-DD
  3. Add values for sample ID
  4. Set correct variable types
Steps to clean data
weather_data_tidied <- weather_data %>%
  janitor::clean_names() %>%
  unite(date, year, month, day, sep = "-") %>%
  mutate(date = ymd(date)) %>%
  select(-c(1, 3)) %>%
  rename("outdoor_temperature" = outdoor_temp_f) %>%
  mutate(outdoor_temperature = (outdoor_temperature - 32) * (5/9)) %>%
  group_by(date) %>%
  summarise(across(where(is.numeric), \(x) mean(x, na.rm = TRUE))) %>%
  slice(-1)

paged_table(weather_data_tidied)
  1. Clean variable names
  2. Merge separate day, month, year columns into one column variable and format as YYYY-MM-DD
  3. Cut columns
  4. Rename outdoor temperature and convert from Fahrenheit to Celsius
  5. Group by date and then take average values per day
  6. Cut first row
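
The date assembly, temperature conversion, and daily averaging can be checked on a small made-up table. This sketch uses base R only (paste(), as.Date(), aggregate()) in place of unite(), ymd(), and summarise(); the values are hypothetical.

```r
# Toy hourly-style readings with separate year, month, and day columns
weather_toy <- data.frame(
  year = c(2024, 2024, 2024),
  month = c(1, 1, 1),
  day = c(9, 9, 10),
  outdoor_temp_f = c(68, 86, 77)
)

# Combine the parts into one Date column
weather_toy$date <- as.Date(
  paste(weather_toy$year, weather_toy$month, weather_toy$day, sep = "-")
)

# Convert Fahrenheit to Celsius: C = (F - 32) * 5 / 9
weather_toy$outdoor_temperature <- (weather_toy$outdoor_temp_f - 32) * 5 / 9

# Average the readings within each day
daily_weather <- aggregate(outdoor_temperature ~ date, data = weather_toy, FUN = mean)
daily_weather
```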
Steps to clean data
new_profile_var_names <- c("depth", "water_temperature", "dissolved_oxygen", "salinity", "conductivity", "visibility", "location", "date")

profiles_data_tidied <- profiles_data %>%
  select(-c(6, 8)) %>%
  mutate(
    temp_column1 = NA_character_,
    temp_column2 = NA_character_
  ) %>%
  setNames(new_profile_var_names) %>%
  mutate(
    location = ifelse(depth == "Location", water_temperature, NA_character_), 
    date = ifelse(depth == "Date",  water_temperature, NA_character_)
  ) %>%
  fill(location, date, .direction = "down") %>%
  filter(depth != "Location", depth != "Date") %>%
  mutate(
    location = case_when(
      location == "L1 Northwest buoy" ~ "back buoy",
      location == "L2 Middle Buoy" ~ "mid buoy",
      location == "L3 Production Dock" ~ "production dock",
      location == "L4 Auwai" ~ "auwei",
      TRUE ~ location
    ),
    date = case_when(
      date %in% c("45258", "2023-11-28") ~ "2023-11-28",
      date %in% c("45282", "2023-12-21") ~ "2023-12-21",
      date %in% c("45536", "2024-01-09") ~ "2024-01-09",
      date %in% c("30/1/24", "30/01/24") ~ "2024-01-30",
      date %in% c("20/02/24", "20/2/24") ~ "2024-02-20",
      TRUE ~ date
    )) %>%
  mutate(
    date = as.Date(date, format = "%Y-%m-%d"),
     conductivity = case_when(
      row_number() %in% c(1:11, 53:62, 128:133, 159:161) ~ NA_character_,
      TRUE ~ as.character(conductivity)
    )
  ) %>%
  filter(!(depth %in% c("Samples", "Depth"))) %>%
  mutate(date = as.Date(date),
         across(c(depth, location), as.factor),
         across(c(water_temperature,  dissolved_oxygen, salinity, 
                  conductivity,visibility),  as.numeric)) %>%
   fill(visibility, .direction = "down") %>%
  mutate(visibility = if_else(date == "2023-11-28",  NA_real_, visibility)) %>%
  mutate(location = str_to_title(location)) %>%
  mutate(location = factor(location, 
                           levels = c("Mid Buoy", "Back Buoy", "Auwei", "Production Dock")))

paged_table(profiles_data_tidied)
  1. Set new variable names manually
  2. Delete unnecessary columns
  3. Temporarily create two new columns to replace the ones we deleted
  4. Assign new profile variable names to rename variables in data set
  5. Copy the location and date values, which sit in the water_temperature column of the rows labeled "Location" and "Date", into the new location and date columns
  6. Fill the location and date values downward
  7. Remove the "Location" and "Date" label rows, which are no longer needed
  8. Rename location values and standardize date values
  9. Set conductivity to NA in the listed rows (starting with rows 1:11), which hold turbidity data
  10. Remove rows whose depth value is "Samples" or "Depth"
  11. Set correct data types for each variable
  12. Fill visibility values downward
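
Values such as "45258" in the date mapping above look like Excel serial dates, that is, day counts that Excel stores instead of formatted dates. As a sketch, base R can convert them using the origin commonly used for Excel files created on Windows. Note that 45536 converts to 1 September 2024, while the pipeline maps it to 2024-01-09; a day/month swap at data entry would explain this, which may be why the dates are mapped by hand rather than converted, though that is our inference.

```r
# Excel (Windows) stores dates as day counts; R can convert them with this origin
excel_origin <- "1899-12-30"

serials <- c(45258, 45536)
converted <- as.Date(serials, origin = excel_origin)
converted_text <- format(converted, "%Y-%m-%d")
converted_text
```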
Steps to clean data
tidal_data_tidied <- tidal_data %>%
  janitor::clean_names() %>%
  mutate(date = as.Date(date, format = "%Y-%m-%d")) %>%
  filter(date >= as.Date("2023-11-20") & date <= as.Date("2024-02-20")) %>%
  select(-2) %>%
  mutate(time = as_hms(format(time, "%H:%M:%S")),
         high_low = as.factor(high_low))
  
paged_table(tidal_data_tidied)
  1. Clean variable names
  2. Set date as correct variable type and format YYYY-MM-DD
  3. Filter to desired date range
  4. Cut column
  5. Set time as time variable type
  6. Set variable as factor type

Merge Tidied Datasets

  • Biogeochemical & Physical Variables
  • Clams
  • Oysters
# Merging
profiles_water_samples_merged <- reduce(list(profiles_data_tidied, water_samples_data_tidied), full_join, by = c("date", "location", "depth")) %>% 
  relocate(date, round, location, depth, .before = water_temperature) %>%
  arrange(date) %>% 
  fill(round, .direction = "down") %>%
  mutate(round = if_else(is.na(round), "1", round),
         round = as.factor(round))

# Complete dataset
compiled_weather_merged <- reduce(list(ksf_compiled_data_tidied, weather_data_tidied), full_join, by = "date")

# Final dataset
biogeochem_vars_merged <- full_join(compiled_weather_merged, profiles_water_samples_merged)


paged_table(biogeochem_vars_merged)
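
Because the sensor data are daily while the field samples are sparse, a full join leaves many missing values. A minimal sketch with made-up data shows the effect; the daily and samples tables and their values are hypothetical.

```r
library(dplyr)

# Toy daily sensor averages and sparser field samples
daily <- data.frame(
  date = as.Date(c("2024-01-09", "2024-01-10")),
  ksf_salinity = c(31.2, 30.8)
)
samples <- data.frame(
  date = as.Date(c("2024-01-10", "2024-01-30")),
  phosphate = c(0.11, 0.09)
)

# full_join keeps every date from both tables and fills the gaps with NA
merged <- full_join(daily, samples, by = "date") %>%
  arrange(date)
merged
```

Only 2024-01-10 has values from both tables; the other two dates each carry one NA.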
Clams Growth Merged with Environmental Variables
# Merging
clams_growth_biogeochem_vars_merged <- full_join(ksf_clams_growth_data_tidied, biogeochem_vars_merged, by = "date")
Oyster Growth Interpolated and Merged with Environmental Variables
oyster_growth_biogeochem_vars_merged <- full_join(ksf_oyster_cylinder_growth_data_tidied, biogeochem_vars_merged, by = "date")

# Interpolating: the other option is to aggregate to weekly or monthly values, but
# the dates are too mismatched to aggregate to monthly, and the dataset would be
# very small if aggregated to monthly

oyster_growth_biogeochem_vars_interp <- oyster_growth_biogeochem_vars_merged %>%
  mutate(across(where(~is.numeric(.x) && any(is.na(.x))),
                ~na.approx(.x, na.rm = FALSE, rule = 2))) %>%
  # relocate(date, round, location, depth, clams_color, clams_stage, .before = days_btwn_clams_sort) %>%
  arrange(date)
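
What na.approx() does with na.rm = FALSE and rule = 2 is easiest to see on a short made-up vector: gaps between observations are filled by linear interpolation, and gaps at either end take the nearest observed value.

```r
library(zoo)

# Toy series with gaps at both ends and in the middle
x <- c(NA, 1, NA, 3, NA)

# Linear interpolation between observed points; rule = 2 carries the nearest
# observed value outward to fill the leading and trailing gaps
filled <- na.approx(x, na.rm = FALSE, rule = 2)
filled
```

On a plain vector, na.approx() interpolates by row position rather than by the spacing between dates, so the row order at the time of interpolation affects the result; sorting by date before interpolating may be worth considering.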

Processed Datasets

Export Tidied Datasets

Export the tidied datasets as CSV files to the data/tidied folder:

source("code/functions/export_to_csv.R")

dfs_to_export <- list(
  ksf_clams_growth_data_tidied = ksf_clams_growth_data_tidied,
  ksf_compiled_data_tidied = ksf_compiled_data_tidied,
  ksf_oyster_cylinder_growth_data_tidied = ksf_oyster_cylinder_growth_data_tidied,
  water_samples_data_tidied = water_samples_data_tidied,
  profiles_data_tidied = profiles_data_tidied,
  biogeochem_vars_merged = biogeochem_vars_merged,
  profiles_water_samples_merged = profiles_water_samples_merged,
  tidal_data_tidied = tidal_data_tidied,
  weather_data_tidied = weather_data_tidied
  )

imap(dfs_to_export, ~ export_to_csv(.x, .y, "data/tidied"))
  1. List of data frames we want to export as CSV files
  2. Iterate the export_to_csv(df, df_name, dir_path) function over each data frame: .x refers to the data frame and .y refers to its name. Both are passed to export_to_csv() along with the desired directory path.
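
The contents of code/functions/export_to_csv.R are not shown here. As a hypothetical stand-in that matches the export_to_csv(df, df_name, dir_path) signature described above, the function could look roughly like this; the actual project file may differ.

```r
# Hypothetical stand-in for code/functions/export_to_csv.R
export_to_csv <- function(df, df_name, dir_path) {
  # Create the output folder if it does not exist yet
  dir.create(dir_path, recursive = TRUE, showWarnings = FALSE)
  file_path <- file.path(dir_path, paste0(df_name, ".csv"))
  write.csv(df, file_path, row.names = FALSE)
  invisible(file_path)  # return the path so callers can check it
}

# Usage on a toy data frame, written to a temporary folder
out_dir <- file.path(tempdir(), "tidied")
toy <- data.frame(date = as.Date("2024-01-09"), weight = 1.5)
written_path <- export_to_csv(toy, "toy_data_tidied", out_dir)
```

Naming each file after its list element is what makes imap() convenient here, since .y supplies the file name without any extra bookkeeping.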

Export merged final data set into data/outputs folder.

Data Dictionaries

Exploratory Data Analysis

Descriptive Statistics

create_summary_table <- function(df) {
  if (is.data.frame(df)) {
    df %>%
      dplyr::select(where(is.numeric)) %>%
      describe() %>%
      dplyr::select(mean, sd, min, max)
  } else {
    cat("Warning: Input is not a data frame. Skipping...\n")
    return(NULL)
  }
}

# Define a named list of data frames so each summary table keeps its name
summary_dfs <- list(
  ksf_clams_growth_data_tidied = ksf_clams_growth_data_tidied,
  ksf_oyster_cylinder_growth_data_tidied = ksf_oyster_cylinder_growth_data_tidied,
  ksf_compiled_data_tidied = ksf_compiled_data_tidied,
  profiles_data_tidied = profiles_data_tidied,
  water_samples_data_tidied = water_samples_data_tidied,
  weather_data_tidied = weather_data_tidied
)

# Apply the create_summary_table function to each data frame in the list;
# map() carries the list names over to the results
summary_tables <- map(summary_dfs, create_summary_table)

# Display the resulting summary tables
summary_tables
$ksf_clams_growth_data_tidied
                         mean       sd     min      max
days_btwn_clams_sort    43.00    17.87   22.00    85.00
clams_count          14408.98 14032.23 2303.28 50939.30
weight                  69.76    62.38   14.67   231.96
avg_weight             367.63   380.13   25.88  1452.80
clams_growth            16.79    33.68  -40.63    78.94
growth_pct               1.08     1.84   -0.56     5.38
clams_sr                 1.10     0.38    0.32     1.50
grouping_variable        5.00     2.66    1.00     9.00

$ksf_oyster_cylinder_growth_data_tidied
                    mean    sd  min   max
oyster_chlorophyll  1.96  0.71 0.59  2.91
weight             19.69 23.18 1.06 56.53
growth_pct          0.10  0.08 0.02  0.28

$ksf_compiled_data_tidied
                                   mean      sd      min      max
ksf_rdo_concentration              6.00    0.66     4.33     7.93
ksf_rdo_saturation                85.80    9.89    61.27   114.64
ksf_oxygen_partial_pressure      132.95   15.30    95.03   177.49
ksf_actual_conductivity        46822.44 2850.45 39536.55 51119.48
ksf_specific_conductivity      47232.35 2506.50 39452.18 50269.78
ksf_salinity                      31.20    1.85    25.59    33.48
ksf_density                        1.02    0.00     1.02     1.02
ksf_total_dissolved_solids        30.70    1.63    25.64    32.68
ksf_chlorophyll_a_fluorescence     2.04    0.91     0.44     4.18
ksf_ammonium                       7.20    1.66     3.39     9.95
ksf_ammonium_m_v                  56.60    5.97    39.64    65.14
ksf_barometric_pressure            0.68    2.67     0.02    17.26

$profiles_data_tidied
                      mean      sd      min     max
water_temperature    24.20    0.90    22.70    25.6
dissolved_oxygen     22.33   32.09     0.34   101.0
salinity          52481.36 1156.87 48300.00 52966.0
conductivity         48.18    4.95    38.10    52.9
visibility            4.72    0.70     4.00     6.0

$water_samples_data_tidied
                                    mean         sd        min        max
chlorophyll_a                       3.08       1.94       0.11       8.25
phosphate                           0.11       0.09       0.03       0.51
silicate                           17.85       8.05       0.93      28.90
nitrate_nitrite                     1.23       0.83       0.35       5.67
ammonia                             1.27       2.38       0.02      12.45
heterotrophic_bacteria        5570496.97 1239586.12 1542116.67 7871466.67
large_phytoplankton             68584.34   46366.26   11400.00  161933.33
synechococcus_population_1     990845.96  672703.86  112050.00 1897200.00
synechococcus_population_2     140281.31  135451.59   16583.33  408900.00
prochlorococcus                 91638.38   72203.62   24150.00  349666.67
lysbeths_mystery_cells_events   23904.55   20264.02    2133.33   75033.33

$weather_data_tidied
                      mean     sd   min    max
outdoor_temperature  22.49   1.28 19.28  25.03
wind_speed_mph        3.90   1.43  1.10   7.24
hourly_rain_inch_hr   0.00   0.01  0.00   0.10
wind_direction      180.49 107.66 28.25 341.75

Paired Line Plots: Oyster & Clam

paired_plot <- ggplot(ksf_clams_growth_data_tidied, aes(x = date, y = weight, group = grouping_variable, color = clams_stage)) +
  geom_line() +  # Connect paired observations with lines
  geom_point(size = 3) +  # Add points for each observation
  labs(x = "Date", y = "Weight", color = "Stage") +  # Label axes and legend
  scale_color_manual(values = c("In" = "blue", "Out" = "red")) +  # Custom color scale
  theme_minimal()  # Apply a minimal theme

paired_plot

Line Plots: Weather & KSF Data

p <- ggplot(ksf_compiled_data_tidied, aes(x = date, y = ksf_rdo_saturation)) +
  geom_line(linewidth = 2, alpha = 0.6) +  # Adjust line size and alpha
  labs(x = "Month", y = "KSF RDO Saturation") +  # Update axis labels
  scale_x_date(date_labels = "%b %Y") +  # Format x-axis date labels
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, vjust = 0.5, hjust = 1, size = 14),
    axis.text.y = element_text(margin = margin(r = 10), size = 14),
    axis.title.x = element_text(margin = margin(t = 20), face = "bold", size = 18),
    axis.title.y = element_text(face = "bold", margin = margin(r = 5), size = 18),
    plot.title = element_text(hjust = 0.5, face = "bold", size = 22),
    legend.position = "bottom",
    legend.title = element_text(face = "bold", size = 14),
    legend.text = element_text(size = 14),
    plot.margin = margin(20, 20, 20, 20),
    panel.spacing = unit(1.5, "lines")
  )
  # ylim(0, 1)
p

Correlational Analysis

  • Clams
  • Oysters

All Relationships


Specific Relationship

Correlation Between Interpolated Weight and Interpolated Dissolved Oxygen
results <- corr.test(x = clams_growth_biogeochem_vars_interp$clams_count, 
                     y = clams_growth_biogeochem_vars_interp$dissolved_oxygen,
                     method = "pearson", ci = TRUE)

print(results, short=FALSE)
Adjusted P-Value
adjusted_p_values <- results$p.adj
print(adjusted_p_values)
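
For readers who want to see the two quantities in isolation, here is a minimal sketch with base R on made-up numbers: cor.test() for a single Pearson correlation with its confidence interval, and p.adjust() for a Holm correction across several tests. We understand Holm to be the default adjustment in psych::corr.test(), but treat that as an assumption and check the adjust argument in your installed version.

```r
# Toy paired measurements (made-up numbers)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 1, 4, 3, 5)

# Pearson correlation with a confidence interval, using base R
toy_result <- cor.test(x, y, method = "pearson")
toy_result$estimate   # correlation coefficient
toy_result$conf.int   # 95% confidence interval

# Holm adjustment for a set of p-values from several tests
raw_p <- c(0.01, 0.04, 0.03)
adjusted_p <- p.adjust(raw_p, method = "holm")
adjusted_p
```

With a single pair of variables there is only one test, so the adjusted p-value equals the unadjusted one; the adjustment matters once all relationships are tested together.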

All Relationships

Specific Relationship
